feat(rw-backend): site index rebuild (catalog/S3 → DB) via scan + worker queue #71

Merged
yumike merged 1 commit into main from feat/ownership-rebuild-queue
Jun 24, 2026
@yumike yumike commented Jun 24, 2026

Splits the doc-comment inbox's ownership rebuild into a dedicated, reworked subsystem: a site index that keeps a fresh per-site DB projection of RW docs sites (structure + ownership), so the backend can answer queries without hitting the catalog/S3 per request. Replaces the single long-running job with a Backstage-catalog-style producer/queue/worker.

Tables (PostgreSQL; sqlite in dev)

  • section_ownership — sparse section→entity claim links, from the catalog scan
  • sections / pages — per-site structure registries, from S3 (listSections/listPages)
  • site_refresh — work queue (next_update_at = due-time + claim lease)
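To illustrate how a single `next_update_at` column can act as both due-time and claim lease, here is a minimal in-memory sketch. It is not the PR's implementation: the row shape, `claimDue`, and the `siteRef` key are hypothetical names, and the real worker claims rows in the database rather than in an array.

```ts
// Hypothetical row shape; only next_update_at is named in the PR description.
interface SiteRefreshRow {
  siteRef: string;      // assumed key column
  nextUpdateAt: number; // epoch ms: when the site is due AND, once claimed, when the lease ends
}

// Claim up to `limit` due rows by pushing next_update_at into the future.
// A claimed row stops being "due", so other workers skip it until the lease expires.
function claimDue(
  rows: SiteRefreshRow[],
  now: number,
  leaseMs: number,
  limit: number,
): string[] {
  const claimed: string[] = [];
  for (const row of rows) {
    if (claimed.length >= limit) break;
    if (row.nextUpdateAt <= now) {
      row.nextUpdateAt = now + leaseMs; // take the lease
      claimed.push(row.siteRef);
    }
  }
  return claimed;
}
```

If a worker crashes mid-refresh, nothing has to be cleaned up in this model: the row simply becomes due again once the lease time passes.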

Pipeline

  • rw-site-index-scan (global): catalog → per-site atomic section_ownership swap + queue upsert; prune only after a clean, fully-successful scan.
  • rw-site-index-worker (local): claim due sites (FOR UPDATE SKIP LOCKED + lease) → load from S3 → swap sections/pages; content-hash short-circuit; p-limit concurrency; per-site error isolation.
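The "prune only after a clean scan" rule can be sketched as follows. This is an in-memory model under stated assumptions, not the code in this PR: `runScan`, `ScanItem`, and the `Map`-backed store are hypothetical, and the real scan swaps `section_ownership` rows per site in the database.

```ts
type Claim = { section: string; entityRef: string };
// One scan result per site: either its claims or a failure.
type ScanItem = { site: string; claims: Claim[] } | { site: string; error: string };

function runScan(
  store: Map<string, Claim[]>,
  items: ScanItem[],
): { clean: boolean; pruned: string[] } {
  const seen = new Set<string>();
  let clean = true;
  for (const item of items) {
    if ("error" in item) {
      clean = false; // remember the failure, but keep scanning the other sites
      continue;
    }
    store.set(item.site, [...item.claims]); // models the per-site swap
    seen.add(item.site);
  }
  const pruned: string[] = [];
  if (clean) {
    // Only a fully successful scan proves that an unseen site is really gone.
    for (const site of Array.from(store.keys())) {
      if (!seen.has(site)) {
        store.delete(site);
        pruned.push(site);
      }
    }
  }
  return { clean, pruned };
}
```

Gating the prune on a clean scan means a partial catalog failure cannot cause ownership data for healthy sites to be deleted.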

Also: shared iterateAnnotatedEntities in rw-common (search collator refactored to consume it); rw.siteIndex.{schedule,worker} config; info/debug logging.

Scope / follow-ups

  • Read-time ownership roll-up + the inbox read endpoint/UI land in a follow-up PR.
  • Requires @rwdocs/core >= 0.1.28 (carries RwSite.listSections/listPages).

Notes for reviewers

  • All tests, lint, and typecheck pass for the new code. The pre-existing comments/router integration suites in the full rw-backend suite can time out under heavy local load; this is unrelated to this PR.

🤖 Generated with Claude Code

…ker queue

Maintain a fresh per-site DB projection of RW documentation sites so the backend
can answer queries (starting with the doc-comment inbox) without hitting the
catalog or S3 per request. Replaces a single long-running rebuild with a
resilient producer/queue/worker model (the pattern Backstage's catalog uses).

Tables (PostgreSQL; sqlite in dev):
- section_ownership: sparse section→entity claim links, from the catalog scan
- sections / pages: per-site structure registries, from S3 (listSections/listPages)
- site_refresh: the work queue (next_update_at doubles as due-time + claim lease)

Pipeline:
- rw-site-index-scan (global): iterate rwdocs.org/ref-annotated catalog entities;
  per site, atomically swap section_ownership links + upsert the queue row; prune
  only after a clean, fully-successful scan.
- rw-site-index-worker (local): claim due sites (FOR UPDATE SKIP LOCKED + lease),
  load each from S3, swap sections/pages registries; skip the write when a content
  hash is unchanged; bounded concurrency via p-limit; per-site errors isolated.
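As a rough sketch of the worker behaviour described above (content-hash short-circuit, bounded concurrency, per-site error isolation), consider the model below. All names are hypothetical. The hand-rolled lanes stand in for p-limit so the example has no dependencies, and the FNV-1a hash is a placeholder, since the commit message does not say which hash is used.

```ts
type Structure = { sections: string[]; pages: string[] };

// Placeholder content hash: 32-bit FNV-1a over the JSON form (an assumption).
function contentHash(s: Structure): string {
  const text = JSON.stringify(s);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

async function processSites(
  sites: string[],
  load: (site: string) => Promise<Structure>, // stands in for the S3 load
  stored: Map<string, string>,                // site -> hash of the last written structure
  concurrency: number,
): Promise<{ written: string[]; skipped: string[]; failed: string[] }> {
  const result = { written: [] as string[], skipped: [] as string[], failed: [] as string[] };
  let next = 0;
  // Each lane pulls the next unprocessed site; `concurrency` lanes run at once.
  async function lane(): Promise<void> {
    while (next < sites.length) {
      const site = sites[next++];
      try {
        const hash = contentHash(await load(site));
        if (stored.get(site) === hash) {
          result.skipped.push(site); // unchanged content: skip the write
          continue;
        }
        stored.set(site, hash); // the real worker would swap sections/pages here
        result.written.push(site);
      } catch {
        result.failed.push(site); // one bad site does not abort the batch
      }
    }
  }
  const lanes = Math.min(concurrency, sites.length);
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return result;
}
```

A second pass over unchanged sites lands entirely in `skipped`, which is the point of the short-circuit: steady-state refreshes cost a load and a hash, not a database write.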

Also: shared iterateAnnotatedEntities in rw-common (search collator refactored to
consume it); rw.siteIndex.{schedule,worker} config; info/debug logging. Read-time
ownership roll-up and the inbox endpoint/UI land in a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yumike yumike force-pushed the feat/ownership-rebuild-queue branch from 36c872f to b9ca7ff Compare June 24, 2026 10:29
@yumike yumike merged commit bc4220e into main Jun 24, 2026
1 check passed
@yumike yumike deleted the feat/ownership-rebuild-queue branch June 24, 2026 10:31